A method, a device and a system for compressing a musical and
voice signal

BACKGROUND OF THE INVENTION

The invention relates to a method, a device and a system for 
compressing a musical and voice signal. 

With today's advances in digital communication technology,
the transmission of data across the Internet and mobile
networks has made information available to the user almost
immediately, even over decentralized communication networks
such as the Internet.

This technology has also shaped the way people enjoy
themselves.

In the audio entertainment field, a user nowadays usually
expects audio enjoyment on an on-demand basis. The Internet
has served as a very useful highway to transport and
distribute musical and voice signals to the user anywhere and
anytime.

The Internet phenomenon is only in its infancy, yet it has
experienced enormous growth. Even so, the increasing number
of users and new applications entering the Internet need a
bandwidth which goes far beyond the bandwidth which is
currently available from communication networks.

Compression technology is therefore the topic of focus, as it
would reduce the bandwidth requirement for the transmission
of data in general, and in particular for the transmission of
musical and voice signals.



For the user, compression of data means a shorter time to
download the data and a need for less storage space, and
therefore savings in money and time.

MP3 (Moving Picture Experts Group (MPEG) Audio Layer 3) has
therefore been widely adopted by the industry as the de facto
standard for transmission of audio data across the
Internet.

MP3 provides a compression ratio of about 10 times over
uncompressed data for CD quality audio (44.1 kHz sampling
rate, 16 bits per sample). To transmit a three-minute
uncompressed song as a musical and voice signal across the
Internet over a 56 kbps connection would take

(44.1 kHz * 16 bit * 2 channels * 3 min * 60 sec/min) / 56 kbps
= 4536 sec = 75.6 min,

which is more than an hour.

MP3 would reduce that to only 7.56 minutes, which is an
amazing feat.
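The arithmetic above can be checked with a short sketch. The constant and function names are illustrative assumptions; the figures (44.1 kHz, 16 bit, two channels, three minutes, 56 kbps, ratio of 10) are the ones quoted in the text.

```python
# Figures quoted in the text; the names are chosen for this sketch only.
SAMPLE_RATE_KHZ = 44.1     # CD sampling rate in kHz
BITS_PER_SAMPLE = 16
CHANNELS = 2
SONG_SECONDS = 3 * 60      # a three-minute song
LINK_KBPS = 56             # modem line rate
MP3_RATIO = 10             # approximate MP3 compression ratio


def transmission_seconds(ratio=1.0):
    """Seconds needed to send one song compressed by the given ratio."""
    kilobits = SAMPLE_RATE_KHZ * BITS_PER_SAMPLE * CHANNELS * SONG_SECONDS
    return kilobits / ratio / LINK_KBPS


uncompressed_min = transmission_seconds() / 60
mp3_min = transmission_seconds(MP3_RATIO) / 60
print(round(uncompressed_min, 2), round(mp3_min, 2))
```

The two printed values correspond to the uncompressed and the MP3 transmission times in minutes discussed above.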

However, to transmit an album of 10 MP3 songs as musical and
voice signals would again take more than an hour. Therefore,
a compression ratio of more than 10 times would be desirable
if Internet music is to become a reality.

As shown in Fig. 1, professional music is usually recorded in
a studio within a soundproof room.

The sound from the musical instruments, also referred to as
musical signals 101, and vocals, i.e. speech signals, also
referred to as voice signals 102, are recorded on separate
tracks.

If the data (comprising the analog musical signals 101 and
the voice signals 102) is to be compressed using a digital
method, the analog signal is first converted to a digital
form through an analog-to-digital conversion device, i.e. the
analog musical signals 101 are converted into digitized



WO 02/05433 



PCT/SG01/00144 




musical signals 103 and the analog voice signals 102 are
converted into digitized voice signals 104.

The separate signals are then mixed down (through a mixer)
onto a master track (the audio signal), which is symbolized
by block 105 in Fig. 1. This master track forms the
compression source for most compression methods (including
MP3).

MP3 audio compression belongs to a class of data compression
schemes called perceptual coding.

This is based on the sub-band / transform coding technique.
Perceptual coding analyses the frequency and amplitude
content of the input signal, and compares it to a model of
human auditory perception.

Information that is audible is coded and everything that is 
inaudible can be discarded. 


The advantage of sub-band / transform coding is that it works
in the frequency domain. The uncorrelated nature of the
spectral components makes it possible to quantise the
spectral components in different frequency bands with a
different number of bits, provided that the resulting
quantization noise is unperceived.
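As a rough illustration of quantising different frequency bands with a different number of bits, the following sketch applies a plain uniform quantizer per band. It is not the MP3 quantizer; the function names, the band values and the bit allocation are assumptions made for this example, and no psychoacoustic model is included.

```python
def quantize(value, bits, full_scale=1.0):
    """Uniformly quantize value (clipped to +/- full_scale) using `bits` bits."""
    if bits < 2:
        raise ValueError("need at least 2 bits (sign plus one magnitude bit)")
    levels = 2 ** (bits - 1) - 1          # positive steps available
    clipped = max(-full_scale, min(full_scale, value))
    return round(clipped / full_scale * levels) * full_scale / levels


def quantize_bands(bands, bits_per_band):
    """Quantize each band's spectral components with its own word length."""
    return [[quantize(c, bits) for c in band]
            for band, bits in zip(bands, bits_per_band)]


# Hypothetical example: the first band gets 8 bits, the second only 3.
bands = [[0.30, -0.62], [0.30, 0.05]]
coded = quantize_bands(bands, [8, 3])
print(coded)
```

The band coded with fewer bits shows a larger quantization error for the same input value, which is acceptable only where that error would not be perceived.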

This advantage is further exploited by MP3 using the masking
phenomenon of the human auditory system. The MP3 encoder
analyses the frequency and amplitude content of the input
audio signal and compares it to a psychoacoustic model of
the human auditory system.



Alternative forms of audio compression include ADPCM
(Adaptive Differential Pulse Code Modulation), wavelet
compression, etc.
After the audio signal is compressed (step 106), the compressed
data are stored in a storage device (step 107), e.g. a hard
disk, a CD-ROM or a semiconductor device like a Flash memory
or a Read-Only Memory (ROM). The data could also be stored
on a server computer, from where it would be transmitted over a
transmission line (such as the Internet) to a user on demand
and stored within the user's storage device in the user's
client computer.

When the user wishes to listen to the piece of audio, the
compressed audio data is decompressed (step 108) and output
to a digital-to-analog device (step 109), with the analog
signal driving a loudspeaker producing music for listening
pleasure.


SUMMARY OF THE INVENTION 

An object of the invention is to compress a musical and voice
signal with an improved compression ratio.

The object is achieved with a method, a device and a system 
for compressing a musical and voice signal with the features 
according to the independent claims. 

In a method for compressing a musical and voice signal, which
musical and voice signal comprises a musical signal and a
voice signal, the sound from the musical instruments, also
referred to as musical signal, and vocals, i.e. speech signal,
also referred to as voice signal 102, are recorded on
separate tracks.

The analog musical signal is then converted into a digitized
musical signal and the analog voice signal is converted into
a digitized voice signal.
For the digitized musical signal, notes parameters of the
musical signal are determined. In this context, notes
parameters are e.g.

o the fundamental frequency of the notes of the musical
signal, and/or
o the amplitude of the musical signal, and/or
o the type of instrument or instruments, which are
involved in generating the musical signal.

The fundamental frequency in this context is the frequency
with which the notes of the reconstructed signal will later
be played.
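The notes parameters named above could be held in a small record such as the following. This is only a sketch: the class name, the field names, the normalized amplitude range and the example values are assumptions and are not prescribed by the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NoteParameters:
    fundamental_hz: float   # fundamental frequency of the note
    amplitude: float        # amplitude, here assumed normalized to 0.0 .. 1.0
    instrument: str         # type of instrument that produced the note


# Hypothetical two-note passage; the values are illustrative only.
passage: List[NoteParameters] = [
    NoteParameters(fundamental_hz=440.0, amplitude=0.8, instrument="piano"),
    NoteParameters(fundamental_hz=523.25, amplitude=0.6, instrument="piano"),
]
print(passage[0])
```

A few numbers per note take far less space than the sampled waveform of that note, which is where the compression gain for the musical signal comes from.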

For the digitized voice signal, a compressed digitized voice
signal is generated, using e.g. a speech recognition
algorithm or a Linear Prediction Coding algorithm (LPC).

The determination of the notes parameters and the compression
of the digitized voice signal are executed independently from
each other.

In a last step, the musical notes parameters are stored
together with the compressed voice signal in a memory, so
that it is possible to restore and decompress the musical
notes parameters and the compressed voice signal, thereby
generating a synthesized musical and voice signal.

The invention provides a much higher compression rate than
the known compression algorithms. The compression rate is
improved even further when using the speech recognition
algorithm, e.g. using Hidden Markov Models, for compressing
the voice signal.

The stored musical notes parameters and compressed voice
signal may be transmitted from a server computer over a
communication network, e.g. via the Internet, to a client
computer, where it is restored and decompressed, thereby
generating the synthesized musical and voice signal, which is
presented to a user of the client computer.

Alternatively, the compressed data may be stored in a
storage device (step 107), e.g. a hard disk, a CD-ROM or a
semiconductor device like a Flash memory or a ROM (Read-Only
Memory), and restored and decompressed from that respective
storage device.

When using the speech recognition algorithm for compressing
the speech signal, the restoring of the compressed voice
signal may comprise the step of text-phoneme-converting of
the compressed voice signal into a speech synthesis signal,
which is used for generating the synthesized musical and
voice signal.

Furthermore, a device for compressing a musical and voice 
signal comprises a processing unit for executing the above 
mentioned steps. 

Thus, the device includes e.g.
o a musical notes determination unit for determining
musical notes parameters of the musical signal,
o a voice signal compression unit for compressing the
voice signal independently from the musical signal, and
o a memory for storing the musical notes parameters
together with the compressed voice signal, so that it is
possible to restore and decompress the musical notes
parameters and the compressed voice signal, thereby
generating a synthesized musical and voice signal, the
memory being connected to the musical notes
determination unit and the voice signal compression
unit.






Furthermore, a system for compressing and decompressing a 
musical and voice signal comprises a processing unit for 
executing the above mentioned steps. 
Thus, the system includes e.g.

o a musical notes determination unit for determining
musical notes parameters of the musical signal,
o a voice signal compression unit for compressing the
voice signal independently from the musical signal,
o a memory for storing the musical notes parameters
together with the compressed voice signal, so that it is
possible to restore and decompress the musical notes
parameters and the compressed voice signal, thereby
generating a synthesized musical and voice signal, the
memory being connected to the musical notes
determination unit and the voice signal compression
unit, and
o a musical and voice signal synthesizing unit for
restoring and decompressing the musical notes parameters
and the compressed voice signal, thereby generating a
synthesized musical and voice signal.

The invention may be implemented using a special electronic
circuit, i.e. in hardware, or using computer programs, i.e.
in software.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram showing an example of a method 
for compressing a musical and voice signal; 

Figure 2 is a block diagram showing a model of human speech
production;

Figure 3 is a block diagram showing an LPC voice coding unit, 
also referred to as a vocoder; and 

Figure 4 is a block diagram showing a system and a method for
compressing a musical and voice signal according to
a preferred embodiment of the invention.

DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

Preferred embodiments of the invention and modifications
thereof will now be described with reference to the
accompanying drawings.

According to the embodiments of the invention, an improved
compression ratio is achieved by synthesizing an audio
signal, i.e. a musical and voice signal, instead of modeling
it.

In order to properly synthesize the audio signal, the 
complete model of the instrument and vocal cord is required. 

Music synthesis has been available, e.g. through equipment
(music synthesizers) that can synthesize musical
instruments. Such a synthesizer is provided with a standard
keyboard input and produces musical output from musical
notes.

Such a synthesizer e.g. uses a Wavetable method, recording
all the notes from a musical instrument and storing them in a
semiconductor storage (ROM). Given the instrument, note and
velocity (the information about how hard and how fast the key
of the keyboard is pressed), the particular musical note can
be played.

Although popular in recording audio signals, it should be
mentioned that music synthesis has never been used as a
compression methodology according to the state of the art.

Furthermore, voice can be synthesized using a text-to-speech
generating method.

Furthermore, it is known to extract the vocal parameters to
mimic a person's voice. Since a human is quite perceptive to
a singer's voice, a compression method that models the general
vocal cord would be sufficient and will be described instead.

The vocal compression according to an embodiment of the
invention uses a method called Linear Predictive Coding
(LPC).

According to the LPC, the way in which human speech is
generated is modeled.


Speech is produced by cooperation of lungs, glottis (with
vocal cords) and articulation tract (mouth and nose cavity).

For the production of voiced sounds, the lungs press air
through the epiglottis and the vocal cords vibrate; they
interrupt the air stream and produce a quasi-periodic
pressure wave.

In the case of unvoiced sounds, the excitation of the vocal
tract is more noise-like.

A model 200 illustrates the human speech production, as shown
in Fig. 2.

The lungs are modeled by a DC source 201, the vocal cords by
an impulse generator 202 and the articulation tract by a
linear filter system 203. A noise generator 204 produces the
unvoiced excitation. Speech sounds 205 consist of both voiced
and unvoiced signals mixed together.
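The speech production model described above can be sketched as an excitation source followed by a linear filter. In this sketch an impulse train stands in for the impulse generator, uniform random noise for the noise generator, and an all-pole recursion for the linear filter system; the function names, the filter form and the example coefficient are assumptions made for illustration.

```python
import random


def excitation(n, pitch_period, voiced, seed=0):
    """Voiced: impulse train with the given period. Unvoiced: white noise."""
    if voiced:
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def all_pole_filter(x, a, gain=1.0):
    """y[n] = gain * x[n] - sum over k of a[k] * y[n - 1 - k]."""
    y = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


# Hypothetical one-coefficient tract model driven by a voiced excitation.
voiced_sound = all_pole_filter(excitation(16, 4, voiced=True), [-0.5])
print(voiced_sound[:5])
```

Switching `voiced` to `False` drives the same filter with noise, which corresponds to the unvoiced branch of the model.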

A great advantage of an LPC coder is the manipulation
facilities and the close analogy to human speech. By
manipulating the parameters of the LPC vocoder, it is for
example possible to transform a male voice into a female
voice or a child's voice. An LPC vocoder can be used as the
engine for the text-to-speech synthesis, which will be
described later in detail.

Fig. 3 shows a block diagram of an LPC vocoder 300.

The first step is to perform an LPC and speech analysis on
the digital voice data, i.e. an LPC analysis (block 301) and
a pitch analysis (block 302).

Both sets of the determined LPC coefficients 303 and the
determined pitch values 304 are then stored in the parameter
memory (block 305). These parameters are then used to control
the synthesis part of the LPC vocoder 300.

In other words, the stored parameters are fed into a pitch
generator 306, which generates reconstructed pitch values 307,
and into a digital filter 308. Furthermore, noise signals 310
are generated by a noise generator 309. The reconstructed
pitch values 307 and the noise signals 310 are amplified
(block 311) and the amplified signals 312 are fed into the
digital filter 308, thereby generating a reconstructed voice
signal 313.
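The text does not say how the LPC analysis and the pitch analysis are computed. One common approach, shown here purely as an assumption, is the autocorrelation method with the Levinson-Durbin recursion for the coefficients and an autocorrelation peak search for the pitch period. All function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum of x[i] * x[i + lag] over the frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Coefficients a of A(z) = 1 + a[0] z^-1 + ... and the residual energy."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:          # silent or perfectly predicted frame
            break
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err          # reflection coefficient of this stage
        updated = a[:]
        for j in range(i):
            updated[j] = a[j] + k * a[i - 1 - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


def pitch_period(x, min_lag, max_lag):
    """Lag (in samples) with the largest autocorrelation in the search range."""
    r = autocorrelation(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])


frame = [1.0, 0.0, 0.0, 0.0] * 4      # hypothetical frame with period 4
coeffs, residual = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs, residual, pitch_period(frame, 2, 8))
```

Only the few coefficients and the pitch value per frame need to be stored in the parameter memory, rather than the samples of the frame themselves.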

In this context, it should be mentioned that the LPC
compression can only be used for human speech compression,
i.e. for compressing a voice signal. It is not suitable for
compression of a musical signal. The compression ratio
achieved by the LPC is much higher than that of any general
audio compression (MP3, ADPCM or wavelet) so far.

Fig. 4 shows a system for compressing and decompressing a
musical and voice signal according to a preferred embodiment
of the invention.

The system 400 comprises a server computer 401 and a
plurality of client computers 402, one of them being shown in
Fig. 4.

The respective steps which are executed during the method are
symbolized as blocks in the server computer 401 and the
client computer 402, respectively.

The server computer 401 and the client computer 402 are
connected to each other via the Internet 403 as a
communication network.

As shown in Fig. 4, an analog musical signal 404 and an analog
voice signal 405 are recorded on separate tracks using a
microphone (not shown).

The analog musical signal 404 and analog voice signal 405 are
converted into a digital musical signal 406 and a digital
voice signal 407 using an analog-to-digital conversion
device.

The digital signal from the musical instrument, i.e. the
digital musical signal 406, is fed into a frequency analyzer,
which determines the fundamental frequency of the notes
played. The amplitude and the type of instrument are also
recorded.

In order to determine these parameters, the digital musical
signal 406 is transformed from the time domain to the
frequency domain. The fundamental frequency is selected and
its amplitude is noted, i.e. stored. The fundamental
frequency is the frequency with which the note will be
played. The frequency and the amplitude are recorded as
described in the General MIDI standard. The frequency is
stored as the note. The amplitude is stored as
the velocity. According to this embodiment, the determined
values are normalized to fit in the required predetermined
range.
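A minimal sketch of this analysis step follows. It assumes that the strongest spectral peak of a naive discrete Fourier transform is taken as the fundamental (which does not hold for every instrument), and that the customary equal-tempered mapping with note 69 at 440 Hz and a 0 to 127 range for note and velocity is the normalization meant. The function names are hypothetical.

```python
import cmath
import math


def strongest_frequency(samples, sample_rate):
    """Return (frequency in Hz, amplitude) of the largest DFT bin."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n, 2.0 * best_mag / n


def to_midi(freq_hz, amplitude, full_scale=1.0):
    """Map frequency to a note number and amplitude to a velocity (0..127)."""
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    velocity = round(127 * min(1.0, amplitude / full_scale))
    return max(0, min(127, note)), max(0, min(127, velocity))


# Hypothetical input: 50 ms of a 440 Hz tone sampled at 8 kHz.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(400)]
freq, amp = strongest_frequency(tone, rate)
print(freq, amp, to_midi(freq, amp))
```

A practical analyzer would use a fast Fourier transform and handle several simultaneous notes; the naive transform is used here only to keep the sketch self-contained.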

Together, the fundamental frequency of the notes played, the
amplitude and the type of instrument form the musical notes
parameters (block 408).

The digital voice signal 407 is fed into an LPC vocoder 409.
The LPC vocoder 409 determines the LPC coefficients as
described above, thereby generating a compressed voice signal
411.

Speech recognition can alternatively be used in place of the
LPC. When using a speech recognition algorithm, Hidden Markov
Models may be used.

The musical notes parameters 410 and the compressed voice
signal 411 are multiplexed and stored in a storage device of the
server computer 401 (block 412), or alternatively on any other
storage medium such as a CD-ROM.

The term "multiplexed" is to be understood in the sense that
a rather small portion of the musical notes parameters 410
and a rather small portion of the compressed voice signal 411
are loaded into a small memory space sufficient to store
those two portions, which respectively form a sub-portion of
the whole musical notes parameters 410 and compressed voice
signal 411.
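The multiplexing described above can be sketched as pairing small portions of the two streams. The sketch assumes that both streams have already been split into portions covering the same time interval; how the portions are sized and the function names are assumptions, not details given in the text.

```python
from itertools import zip_longest


def multiplex(note_portions, voice_portions):
    """Pair the i-th notes portion with the i-th voice portion."""
    return list(zip_longest(note_portions, voice_portions, fillvalue=b""))


def demultiplex(packets):
    """Split the packets back into the two portion lists."""
    notes = [n for n, _ in packets]
    voice = [v for _, v in packets]
    return notes, voice


# Hypothetical portions; real portions would hold encoded parameters.
packets = multiplex([b"n0", b"n1"], [b"v0"])
print(packets)
```

A player then only needs memory for one packet at a time and can start reconstructing audio as soon as the first packet is available.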

With this optional feature, it is possible to reduce the
required memory space in the client computer, which is
especially advantageous if the client computer is a cheap and
rather low-end device such as a mobile phone or a PDA having
an audio player, with which it is possible to reconstruct and
play the reconstructed audio signal.
Another advantage of the storing of a small portion of the
musical notes parameters 410 and a small portion of the
compressed voice signal 411 together is that in this case it
is not necessary to transmit the entire musical notes
parameters 410 and the entire compressed voice signal 411
before beginning to reconstruct and play the audio signal,
i.e. the song. This is particularly advantageous when using a
rather slow communication network such as the Internet 403 or
using a slow telephone modem line between the server computer
401 and the client computer 402.

The data 413 is transmitted across the Internet 403 on an
on-demand basis.

The received data 413 is then stored within the client
computer 402 (block 414).

When the user of the client computer 402 wishes to listen to
a piece of music, the compressed data 413 is extracted and
decompressed e.g. in real-time.

In other words, the stored musical notes parameters 410 are
extracted (block 415) and a decompressed digital musical
signal 416 is generated using the Wavetable method used in a
usual keyboard synthesizer.
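The Wavetable reconstruction can be sketched as reading a stored waveform at a speed given by the note and scaling it by the velocity. A real wavetable holds recorded instrument notes; the single sine cycle used here, the 8 kHz sample rate and the function names are stand-in assumptions for this sketch.

```python
import math

TABLE_SIZE = 256
# Stand-in wavetable: one sine cycle instead of a recorded instrument note.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def midi_to_hz(note):
    """Equal-tempered frequency of a note number, with note 69 at 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def render_note(note, velocity, seconds, sample_rate=8000, table=SINE_TABLE):
    """Read the table at a step proportional to the note frequency."""
    step = midi_to_hz(note) * len(table) / sample_rate
    gain = velocity / 127
    phase = 0.0
    out = []
    for _ in range(int(seconds * sample_rate)):
        out.append(gain * table[int(phase) % len(table)])
        phase += step
    return out


samples = render_note(note=69, velocity=127, seconds=0.5)
print(len(samples))
```

Selecting a different table per instrument type would cover the third notes parameter; interpolation between table entries is omitted for brevity.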

Furthermore, the stored compressed voice signal 411 is 
decompressed (block 417) and a decompressed digital voice 
signal 418 is generated. 


When using the LPC, the decompressed digital voice signal 418 
is generated in the way described with reference to the LPC 
vocoder of Fig. 3. 

In general, text-to-speech conversion is used for the
synthesis of the digital voice signal 418. This means that a
stored dictionary of text and corresponding phonemes is used.
Each phoneme has a corresponding voice. The intonation and
stress of the voice are adjusted based on the particular
context of the reconstructed digital voice signal 418. The
intonation and the stress may be provided by the melody of
the digital musical signal 410 using the note pitch and its
amplitude.

Using the speech recognition algorithm and the
corresponding text-to-speech conversion will usually not
generate the sound of the original singer. However, using the
speech recognition algorithm provides the higher compression
ratio.

The "raw" musical signals and voice signals are combined
either by digital or analog means.
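For the digital alternative, which the text mentions but does not detail, a plain sample-wise sum with clipping is one possible sketch. The function name, the zero padding of the shorter signal and the clipping limit are assumptions.

```python
def mix(music, voice, limit=1.0):
    """Add two sample lists, padding the shorter with silence and clipping."""
    n = max(len(music), len(voice))
    out = []
    for i in range(n):
        m = music[i] if i < len(music) else 0.0
        v = voice[i] if i < len(voice) else 0.0
        out.append(max(-limit, min(limit, m + v)))
    return out


# Hypothetical samples from the two decompressed signals.
combined = mix([0.5, 0.8], [0.25, 0.5, -0.1])
print(combined)
```

With digital mixing, only one digital-to-analog conversion of the combined signal is needed.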

For analog combination, a digital-to-analog conversion
process will convert the digital signals to analog signals.
In other words, the decompressed digital musical signal 416
is converted into a decompressed analog musical signal 420
(block 419). The reconstructed digital voice signal 418 is
converted into a reconstructed analog voice signal 422
(block 421).

The analog musical signal 420 and the analog voice signal 422
are then combined through a summing operational amplifier,
i.e. a mixer 423, thereby generating a reconstructed analog
musical and voice signal 424.

The analog musical and voice signal 424 is output to a power
amplifier 425 and the thereby generated amplified analog
musical and voice signal 426 is used to drive a speaker in
order to generate the audio signal 427 output to the user of
the client computer 402.
CLAIMS



What is claimed is: 

1. A method for compressing a musical and voice signal,
comprising a musical signal and a voice signal, which method
comprises the following steps:

o determining musical notes parameters of the musical
signal,
o compressing the voice signal independently from the
musical signal, and
o storing the musical notes parameters together with the
compressed voice signal, so that it is possible to
restore and decompress the musical notes parameters and
the compressed voice signal, thereby generating a
synthesized musical and voice signal.

2. The method according to claim 1, wherein

o the musical signal is a digitized musical signal, and
o the voice signal is a digitized voice signal.

3. The method according to claim 1 or 2, wherein

o the fundamental frequency of the notes of the musical
signal, and/or
o the amplitude of the musical signal, and/or
o the type of instrument or instruments, which are
involved in generating the musical signal,
are determined as the musical notes parameters.

4. The method according to any one of the claims 1 to 3,
wherein the compressing of the digital voice signal is
performed using a linear prediction algorithm.

5. The method according to any one of the claims 1 to 3,
wherein the compressing of the digital voice signal is
performed using a speech recognition algorithm.
6. The method according to claim 5,
wherein the speech recognition algorithm is an algorithm
using Hidden Markov Models.

7. The method according to any one of the preceding claims,
wherein the stored musical notes parameters and compressed
voice signal are restored and decompressed, thereby
generating the synthesized musical and voice signal.

8. The method according to claims 5 and 7,
wherein the restoring of the compressed voice signal
comprises the step of text-phoneme-converting of the
compressed voice signal into a speech synthesis signal, which
is used for generating the synthesized musical and voice
signal.

9. A device for compressing a musical and voice signal,
comprising a musical signal and a voice signal, the device
comprising:

o a musical notes determination unit for determining
musical notes parameters of the musical signal,
o a voice signal compression unit for compressing the
voice signal independently from the musical signal, and
o a memory for storing the musical notes parameters
together with the compressed voice signal, so that it is
possible to restore and decompress the musical notes
parameters and the compressed voice signal, thereby
generating a synthesized musical and voice signal, the
memory being connected to the musical notes
determination unit and the voice signal compression
unit.

10. A system for compressing and decompressing a musical and
voice signal, comprising a musical signal and a voice signal,
the system comprising:

o a musical notes determination unit for determining
musical notes parameters of the musical signal,
o a voice signal compression unit for compressing the
voice signal independently from the musical signal,
o a memory for storing the musical notes parameters
together with the compressed voice signal, so that it is
possible to restore and decompress the musical notes
parameters and the compressed voice signal, thereby
generating a synthesized musical and voice signal, the
memory being connected to the musical notes
determination unit and the voice signal compression
unit, and
o a musical and voice signal synthesizing unit for
restoring and decompressing the musical notes parameters
and the compressed voice signal, thereby generating a
synthesized musical and voice signal.


11. A computer readable medium, having a program recorded
thereon, where the program makes the computer execute a
procedure comprising the following steps for compressing a
musical and voice signal, comprising a musical signal and a
voice signal:

o determining musical notes parameters of the musical
signal,
o compressing the voice signal independently from the
musical signal, and
o storing the musical notes parameters together with the
compressed voice signal, so that it is possible to
restore and decompress the musical notes parameters and
the compressed voice signal, thereby generating a
synthesized musical and voice signal.


12. A computer program element which makes the computer
execute a procedure comprising the following steps for
compressing a musical and voice signal, comprising a musical
signal and a voice signal:

o determining musical notes parameters of the musical
signal,
o compressing the voice signal independently from the
musical signal, and
o storing the musical notes parameters together with the
compressed voice signal, so that it is possible to
restore and decompress the musical notes parameters and
the compressed voice signal, thereby generating a
synthesized musical and voice signal.